Conversation
# ---------- Helpers ----------

def get_support(itemset, transactions):
Good extraction of get_support() as a standalone helper — makes the
logic much easier to test in isolation. However, it's missing type
hints and a docstring. Suggestion:
def get_support(itemset: frozenset, transactions: list[set]) -> int:
    """Return the number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset.issubset(t))
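To make the suggested signature concrete, here is a minimal usage sketch. The sample transactions and variable names are made up for illustration, and the `list[set]` annotation assumes Python 3.9 or newer.

```python
def get_support(itemset: frozenset, transactions: list[set]) -> int:
    """Return the number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset.issubset(t))


# Hypothetical sample data, not taken from the PR.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
]

# {"bread", "milk"} is contained in the first two transactions.
pair_support = get_support(frozenset({"bread", "milk"}), transactions)

# The empty itemset is a subset of every transaction.
empty_support = get_support(frozenset(), transactions)

print(pair_support, empty_support)
```

Because the helper takes plain sets and returns an int, it can be unit-tested without running the full apriori() loop.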
for t in transactions:
    for c in candidates:
        if c.issubset(t):
            candidate_counts[c] += 1
Using defaultdict(int) here is cleaner than the old counts = [0] *
len(itemset) approach — no more index tracking with enumerate().
One suggestion: the support counting loop (lines 93-96) could use
the new get_support() helper you defined earlier to avoid duplication
and keep the main apriori() function cleaner:
frequent = {
    c: get_support(c, transactions)
    for c in candidates
    if get_support(c, transactions) >= min_support
}
Current logic
The current implementation is already efficient:
for t in transactions:
    for c in candidates:
        if c.issubset(t):
            candidate_counts[c] += 1
It passes over the transactions once, counts all candidates during that pass, and avoids rescanning the transaction list for each candidate, which makes it the more efficient approach.
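For reference, the single-pass counting can be sketched as a self-contained function. The function name `count_candidates`, the sample data, and the final filtering step are assumptions for illustration; the PR's apriori() may structure this differently.

```python
from collections import defaultdict


def count_candidates(candidates, transactions):
    """Count each candidate's support in one pass over the transactions."""
    candidate_counts = defaultdict(int)
    for t in transactions:  # each transaction is visited exactly once
        for c in candidates:
            if c.issubset(t):
                candidate_counts[c] += 1
    return candidate_counts


# Hypothetical sample data, not taken from the PR.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}]
candidates = [
    frozenset({"a", "b"}),
    frozenset({"b", "c"}),
    frozenset({"a", "c"}),
]
min_support = 2

counts = count_candidates(candidates, transactions)

# Keep only the candidates that reach the minimum support.
frequent = {c: n for c, n in counts.items() if n >= min_support}
print(frequent)
```

Note that candidates never seen in any transaction do not appear in `counts` at all, which is harmless here because they would be filtered out anyway.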
Your suggestion
frequent = {
    c: get_support(c, transactions)
    for c in candidates
    if get_support(c, transactions) >= min_support
}
It calls get_support() twice for every candidate that meets min_support, which doubles the cost for those candidates:
- once in the if condition
- once more to compute the value
So it improves readability, but not performance.
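The duplicated work can be checked with a small instrumented sketch. The call counter and the sample data are hypothetical and exist only to make the extra calls visible.

```python
# Hypothetical call counter, used only to observe how often the helper runs.
calls = {"n": 0}


def get_support(itemset, transactions):
    """Return the number of transactions containing the itemset."""
    calls["n"] += 1
    return sum(1 for t in transactions if itemset.issubset(t))


# Hypothetical sample data, not taken from the PR.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}]
candidates = [
    frozenset({"a", "b"}),  # support 2, passes the filter
    frozenset({"b", "c"}),  # support 2, passes the filter
    frozenset({"a", "c"}),  # support 1, rejected by the filter
]
min_support = 2

frequent = {
    c: get_support(c, transactions)
    for c in candidates
    if get_support(c, transactions) >= min_support
}

# One call per candidate in the condition, plus one more per accepted candidate.
print(calls["n"])
```

With this data the helper runs five times for three candidates: the filter evaluates all three, and the two that pass are evaluated again for the value.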
Aligned with your suggestion
candidate_counts = {}
for c in candidates:
    support = get_support(c, transactions)
    if support >= min_support:
        candidate_counts[c] = support
Here we compute the support only once per candidate, avoiding the duplicated call, while keeping the logic readable. This way we get both readability and performance.
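If the comprehension form is still preferred, one possible variant (not proposed anywhere in this thread) is an assignment expression, which keeps a single call per candidate. This sketch assumes Python 3.8 or newer; the function name `frequent_itemsets` and the sample data are hypothetical.

```python
def get_support(itemset, transactions):
    """Return the number of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset.issubset(t))


def frequent_itemsets(candidates, transactions, min_support):
    """Map each frequent candidate to its support, calling the helper once each."""
    return {
        c: support
        for c in candidates
        # The walrus operator stores the result so the value can reuse it.
        if (support := get_support(c, transactions)) >= min_support
    }


# Hypothetical sample data, not taken from the PR.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}]
candidates = [
    frozenset({"a", "b"}),
    frozenset({"b", "c"}),
    frozenset({"a", "c"}),
]

result = frequent_itemsets(candidates, transactions, min_support=2)

# The explicit loop, for comparison.
candidate_counts = {}
for c in candidates:
    support = get_support(c, transactions)
    if support >= 2:
        candidate_counts[c] = support

print(result == candidate_counts)
```

Whether the walrus form reads better than the explicit loop is a matter of taste; both produce the same dictionary.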
Co-authored-by: Copilot <copilot@github.com>